Using Template Exploration for Large-Scale Machine Learning

ABSTRACT

Systems and techniques are provided for template exploration in a large-scale machine learning system. A method may include obtaining multiple base templates, each base template comprising multiple features. A template performance score may be obtained for each base template and a first base template may be selected from among the multiple base templates based on the template performance score of the first base template. Multiple cross-templates may be constructed by generating a cross-template of the selected first base template and each of the multiple base templates. Performance of a machine learning model may be tested based on each cross-template to generate a cross-template performance score for each of the cross-templates. A first cross-template may be selected from among the multiple cross-templates based on the cross-template performance score of the cross-template. Accordingly, the first cross-template may be added to the machine learning model.

BACKGROUND

Large-scale data processing may include extracting data of interest from raw data in one or more databases and processing it into a data product. For example, regression analysis may be conducted based on a very large dataset and includes statistical processes for estimating the relationships among variables. It may be used to predict or forecast a given action or event and may be based on analyzing historical or test data containing variables that contribute to the prediction and forecasting. As a specific example, large-scale machine learning systems may process large amounts of training data from data streams received by the system. A data stream may include training examples corresponding to specific instances of an event or action such as when a user selects a specific search result, or when a single video is viewed from among multiple videos presented to a user. An example may contain features (i.e., observed properties such as a user being located in the USA, a user preferring to speak English, etc.) and may also contain a label which may indicate an event or action associated with the example (e.g., a user selected a specific search result, a user did not select a specific search result, a user viewed a particular video, etc.). These examples may be used to generate statistics for each of the features and these statistics may be used to generate a model. As a result, a machine learning system may use this model to make predictions.

BRIEF SUMMARY

According to an embodiment of the disclosed subject matter, a method may include obtaining a plurality of base templates, each base template comprising a plurality of features. A template performance score may be obtained for each base template. A first base template may be selected from the plurality of base templates based on the template performance score of the first base template. A first plurality of cross-templates may be constructed by generating a cross-template of the selected first base template and at least one of the plurality of base templates. The performance of a machine learning model may be tested based on each of the first plurality of cross-templates to generate a cross-template performance score for each of the first plurality of cross-templates. Next, a first cross-template of the first plurality of cross-templates may be selected based on the cross-template performance score of the first cross-template and the first cross-template may be added to the machine learning model.

An implementation of the disclosed subject matter provides a system including a processor configured to obtain a plurality of base templates, each base template comprising a plurality of features. A template performance score may be obtained for each base template. A first base template from the plurality of base templates may be selected based on the template performance score of the first base template. A first plurality of cross-templates may be constructed by generating a cross-template of the selected first base template and at least one of the plurality of base templates. The performance of a machine learning model may be tested based on each of the first plurality of cross-templates to generate a cross-template performance score for each of the first plurality of cross-templates. Next, a first cross-template of the first plurality of cross-templates may be selected based on the cross-template performance score of the first cross-template and the first cross-template may be added to the machine learning model.

Implementations of the disclosed subject matter provide template exploration techniques for use in large-scale machine learning. Because large-scale machine learning systems process large amounts of training data, e.g., features, techniques for improving model generation based on these features may be very helpful. By crossing templates of features for use in machine learning model generation, the overall performance of such systems may be improved. Additional features, advantages, and embodiments of the disclosed subject matter may be set forth or apparent from consideration of the following detailed description, drawings, and claims. Moreover, it is to be understood that both the foregoing summary and the following detailed description are examples and are intended to provide further explanation without limiting the scope of the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further understanding of the disclosed subject matter, are incorporated in and constitute a part of this specification. The drawings also illustrate embodiments of the disclosed subject matter and together with the detailed description serve to explain the principles of embodiments of the disclosed subject matter. No attempt is made to show structural details in more detail than may be necessary for a fundamental understanding of the disclosed subject matter and various ways in which it may be practiced.

FIG. 1 shows an example process according to an implementation of the disclosed subject matter.

FIG. 2 shows an example template exploration technique according to an implementation of the disclosed subject matter.

FIG. 3 shows a computer according to an embodiment of the disclosed subject matter.

FIG. 4 shows a network configuration according to an embodiment of the disclosed subject matter.

DETAILED DESCRIPTION

In general, large-scale data processing systems process large amounts of data from various sources and/or machines. As a specific example, large-scale machine learning systems may process large amounts of training data from data streams received by the system. A data stream may include training examples corresponding to specific instances of an event or action such as when a user selects a specific search result, or when a single video is viewed from among multiple videos presented to a user. An example may contain features (i.e., observed properties such as a user being located in the USA, a user preferring to speak English, etc.) and may also contain a label which may indicate (e.g., positive or negative) the occurrence of an event or action associated with the example (e.g., a user selected a specific search result, a user did not select a specific search result, a user viewed a particular video, etc.).

A machine learning system may contain one or more learners. A learner may include numerous workers such as a mapper or a reducer. A single mapper may receive examples from multiple shards. As an example, a first mapper may receive example A and example B from a data stream. Both examples may contain features F1, F2, and F3. The mapper may generate a first statistic (e.g., based on a label indicating that a user selected a search result) for F1 based on example A and a second statistic (e.g., based on a label indicating that a user selected a search result) for F1 based on example B. More specifically, the mapper may indicate a +1 for F1 based on example A and a +1 for F1 based on example B. The two statistics for F1 (i.e., +1 and +1) may be combined at the mapper, resulting in an overall mapper statistic ‘MS1’ for F1 of +2. Similarly, a different mapper may also receive examples from other data streams, and generate an overall mapper statistic ‘MS2’ for F1 of +4 based on the respective examples in those data streams.

The overall mapper statistics (e.g., MS1 and MS2) for F1 may be provided to a reducer R1. The reducer R1 may be configured to collect overall mapper statistics from two or more mappers within the learner and generate a weight based on the collected mapper statistics. The reducer R1 may collect MS1 (i.e., +2) and may also collect MS2 (i.e., +4) and generate the weight +6. Similarly, a second reducer R2 may receive overall mapper statistics for feature F2 and generate a weight of −3. The reducers may provide the weights to a model such that the model contains at least the following:

Model: +6(F1)−3(F2) . . .
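The mapper and reducer arithmetic described above may be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names are hypothetical, positive labels are encoded as +1 as in the text, and encoding a negative label as -1 and having the reducer simply sum the mapper statistics are assumptions made so that the example reproduces MS1 = +2 and the weight +6.

```python
def mapper_statistic(examples, feature):
    """Combine per-example statistics for one feature at a single mapper.

    Each example is a (features, label) pair; label is +1 for a positive
    event (assumption: a negative event would be encoded as -1).
    """
    return sum(label for features, label in examples if feature in features)


def reduce_weight(mapper_statistics):
    """Combine the overall mapper statistics collected for one feature.

    Assumption: the reducer sums them, which matches +2 and +4 giving +6.
    """
    return sum(mapper_statistics)


# Examples A and B both contain F1, F2, F3 and are positively labeled.
example_a = ({"F1", "F2", "F3"}, +1)
example_b = ({"F1", "F2", "F3"}, +1)

ms1 = mapper_statistic([example_a, example_b], "F1")
ms2 = 4  # overall statistic for F1 reported by the second mapper in the text
weight_f1 = reduce_weight([ms1, ms2])
print(ms1, weight_f1)
```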

As described above, a machine learning system may receive and process 100s of billions of training examples, each example including multiple features. These 100s of billions of features may be used to generate a model, as shown above, and a machine learning model may be used to make predictions based on statistics associated with features in the model. Many machine learning algorithms use a variety of feature exploration techniques to produce more expressive models that can better capture patterns in the training data. However, when there are billions of features and billions of examples, standard feature exploration techniques do not scale well. Standard feature exploration techniques, such as those used by boosting algorithms, often add a small number of new features at a time to a model. This technique, however, does not scale well to machine learning models that may contain billions of features. As such, it may be advantageous to select some groups of features, from among the billions of features present in training data, to be included in the machine learning model.

The present disclosure provides techniques based on feature templates and template exploration. A template may be a category of feature-types and a template may include multiple features, all of which are from the same category. A template may be a single category of features (e.g., a base template) or multiple categories of features (e.g., a cross-template). A specific type of template may be a base template that is a single category of features. For example, a base template may be “language” and the features included in the template may be English, Spanish, French, German, Hindi, Italian, Japanese, and the like. Each of these features may be associated with a previous example that was received by the system. As another example, a base template may be “country” and the features included in the template may include United States, Canada, France, United Kingdom, Mexico, Japan, India, Italy, China, Australia, and the like. Yet another example may be the base template “keyword” that may include features such as “keyword:free” and “keyword:books”. According to the present disclosure, a cross-template may also be constructed. A cross-template may be another special type of template that is a cross of two or more base templates. A cross-template may be constructed from a combination of templates such as “country X keyword” which will include features such as “US X books” and “France X free”. In machine learning models that may contain 100s of billions of features, a well-performing model may have, for example, 100 or more total templates, many of which may be cross-templates containing 3 or more combinations of templates. Since exploring the space of all feature templates is infeasible, it is necessary to efficiently explore the space of templates based on estimating the gain of a cross-template containing a combination of multiple templates. This technique may be used to optimize performance of a machine learning model by using a greedy strategy for selecting templates and cross-templates to include in the model.

In general, a greedy strategy for generating a cross-template to include in a machine learning model may have multiple components. First, there may be a candidate set of templates based on which a cross-template may be created. There may be a selection technique used to identify the best candidate template which may be selected and added to the cross-template. The machine learning model may be tested using the candidate cross-template to determine if the candidate cross-template contributes to the performance of the machine learning model. For example, a cross-template performance score may be generated based on the performance of the machine learning model including the candidate cross-template. Based on the cross-template performance score, the candidate cross-template may be added to the machine learning model and may result in improved predictions by the machine learning system.

As a specific example, a machine learning system may be exploring 3 base templates {A, B, C} to be included in an empty machine learning model. Beginning with the empty model, first, the base templates {A, B, C} may each be scored. Base template B may be the highest scoring template and, as a result, may be added to the model. In the next round of template exploration, all base templates not added to the model plus all possible extensions (e.g., by one more base template) to the templates in the model may be explored. In this case the templates that may be scored may include {A, C, A X B, B X C}. Out of these templates, template B X C may be the highest scoring and may be added to the model. As a result, the next round of template exploration may score the templates {A, C, A X B, A X B X C}. As such, each iteration for selecting a template may include scoring one or more base templates, and one or more cross-templates (i.e., which include two or more base templates).
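The candidate sets in the {A, B, C} example can be reproduced with a short sketch. The representation is an assumption chosen for brevity: each template is a frozenset of base-template names, so "A X B" and "B X A" are the same template and self-crossing is not modeled. The names `next_candidates` and `label` are hypothetical.

```python
def next_candidates(base_templates, model):
    """Return the templates to score in the next exploration round.

    Candidates are (1) base templates not yet in the model and (2) every
    template in the model extended by one more base template.
    """
    bases = [frozenset([name]) for name in base_templates]
    candidates = {b for b in bases if b not in model}
    for template in model:
        for b in bases:
            extended = template | b
            # Skip no-op extensions and templates already in the model.
            if extended != template and extended not in model:
                candidates.add(extended)
    return candidates


def label(template):
    """Render a template in the 'A X B' notation used in the text."""
    return " X ".join(sorted(template))


# Round 2: the model contains only B.
round2 = next_candidates(["A", "B", "C"], {frozenset("B")})
# Round 3: the model contains B and B X C.
round3 = next_candidates(["A", "B", "C"], {frozenset("B"), frozenset("BC")})
print(sorted(label(t) for t in round2))
print(sorted(label(t) for t in round3))
```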

Implementations disclosed herein provide methods and systems for using template exploration for large-scale machine learning. FIG. 1 shows an example process according to an implementation of the disclosed subject matter. A system may include a processor configured to obtain multiple base templates and each base template may include multiple features, at 101. A template performance score may be obtained for each base template, at 102. A base template from among the multiple base templates may be selected based on the template performance score of the base template, at 103. Next, multiple cross-templates may be constructed by generating a cross-template of the selected base template and at least one of the multiple base templates, at 104. The performance of a machine learning model may be tested based on each of the multiple cross-templates to generate a cross-template performance score for each of the multiple cross-templates, at 105. A cross-template of the multiple cross-templates may be selected based on the cross-template performance score of the cross-template, at 106, and the cross-template may be added to the machine learning model, at 107.

A cross-template may be constructed from a combination of templates by generating a cross product by crossing all of the features from one template with all of the features from another template. For example, a template “country” may be crossed with a template “keyword”. The template “country” may include the features “United States”, “Canada”, and “France” and the template “keyword” may include the features “books”, “free”, and “dog.” A cross product template “country X keyword” would include the features “United States X books”, “Canada X books”, “France X books”, “United States X free”, “Canada X free”, “France X free”, “United States X dog”, “Canada X dog”, and “France X dog”. Each of these features in the cross template may be associated with examples in which the feature occurred. For example, a statistic associated with the feature “United States X books” would be based on examples in which both features “United States” and “books” were present. A cross template may be constructed from any number of templates; however, as the number of templates included in a cross template increases, the number of relevant examples may decrease. For example, in contrast to the cross template “country X keyword” described above, there may be a relatively small number of examples associated with a cross template “country X keyword X language X gender X result ID X video ID” since there may be only a small number of examples in which features from all the templates “country”, “keyword”, “language”, “gender”, “result ID”, and “video ID” occurred. In some cases, a cross-template may be constructed based on self-crossing of a template. For example, the template “keyword” may include the features “plasma” and “TV”. The system may have received 6 examples including the feature “plasma”, 18 examples including the feature “TV” and among all these examples, 2 examples may include both the features “plasma” and “TV”. The template “keyword” may be crossed with the template “keyword” in which case the relevant examples would be restricted to examples in which 2 or more features from the keyword template are included such as the 2 examples that included both features “plasma” and “TV”.
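As an illustration of the cross product and of how crossing restricts the relevant examples, the sketch below builds the nine “country X keyword” features and counts co-occurrences for the “plasma”/“TV” self-cross. The helper names are hypothetical, and the example list is constructed only so that its counts match those in the text (6 with “plasma”, 18 with “TV”, 2 with both).

```python
from itertools import product


def cross_template(*templates):
    """Cross every feature of each template with every feature of the others."""
    return [" X ".join(combo) for combo in product(*templates)]


def count_examples_with(examples, features):
    """Count examples in which all of the given features are present."""
    required = set(features)
    return sum(1 for example in examples if required <= example)


country = ["United States", "Canada", "France"]
keyword = ["books", "free", "dog"]
country_x_keyword = cross_template(country, keyword)
print(len(country_x_keyword))

# Self-cross of "keyword": only examples containing both features are relevant.
examples = [{"plasma"}] * 4 + [{"TV"}] * 16 + [{"plasma", "TV"}] * 2
print(count_examples_with(examples, ["plasma"]),
      count_examples_with(examples, ["TV"]),
      count_examples_with(examples, ["plasma", "TV"]))
```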

Various template performance criteria may be used to generate a template performance score. Template performance criteria and techniques for generating a template performance score described herein may be used for any type of template, such as a base template, a cross-template, and the like. Template performance criteria may include, for example, a number of occurrences associated with features in a specific template, a number of impressions associated with features in a specific template, and the like. A number of occurrences may be the number of received training examples in which a specific feature was included. The number of occurrences for each feature in a template may be added up to generate a number of occurrences associated with the template. For example, a “country” template may include the features “United States”, “Canada”, “United Kingdom”, and “France.” The system may have received 26 examples including the feature “United States”, 23 examples including the feature “Canada”, 18 examples including the feature “United Kingdom”, and 11 examples including the feature “France.” Thus, the number of occurrences of features in the “country” template may be 78. A higher number of occurrences of features in a specific template may allow a machine learning system to make more accurate predictions. As such, a template may be assigned a higher score based on the template having a higher number of occurrences relative to one or more other templates having a lower number of occurrences. Additionally, a rate of occurrence may be generated for a template based on the number of occurrences of features in the template out of a set number of training examples (e.g., over a set time period, over all time, etc.) received by the system. This occurrence rate may also be used as a template performance criterion. As described above, computing a number of occurrences and/or an occurrence rate may also be performed for a cross-template based on all the features combined in a cross-template.
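The occurrence count for the “country” template is a plain sum over its features, as the sketch below shows. The function names are hypothetical, and the total of 200 training examples used for the rate is an illustrative value that does not appear in the text.

```python
def template_occurrences(feature_occurrences):
    """Add up the occurrences of every feature in a template."""
    return sum(feature_occurrences.values())


def occurrence_rate(feature_occurrences, total_examples):
    """Occurrences of the template's features out of a set number of examples."""
    return template_occurrences(feature_occurrences) / total_examples


country_occurrences = {
    "United States": 26,
    "Canada": 23,
    "United Kingdom": 18,
    "France": 11,
}

print(template_occurrences(country_occurrences))
# Hypothetical total of 200 received training examples.
print(occurrence_rate(country_occurrences, total_examples=200))
```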

Another template performance criterion may be a number of impressions associated with features in a specific template. As mentioned above, a training example used for training a machine learning system typically contains a label corresponding to a resulting event or action (e.g., a user selected a search result, a user did not select the search result, a user viewed a video, etc.). An impression may refer to a positive event or action as indicated by a label included in an example having one or more features. Referring to the example above, among the 11 examples including the feature “France”, 10 of these training examples may have a label indicating that a user selected a particular search result. Accordingly, the template “country” may be given +10 added to the number of impressions based on the 10 positively labeled examples that included the feature “France”. In this case, 11 occurrences of the feature “France” may be a low frequency in relation to the number of occurrences of the other features “United States”, “Canada”, and “United Kingdom”; however, regarding the feature “France”, 10 impressions (e.g., selections of a particular item by a user) out of 11 occurrences may be a significant signal in a machine learning model for making predictions associated with the features in the “country” template. Similar values may be added to the number of impressions associated with the “country” template based on the number of positively labeled examples including the features “United States”, “Canada”, and “United Kingdom.” As such, the total number of impressions associated with the features in the “country” template may be used to assign a template performance score to the “country” template. Accordingly, the number of impressions associated with a template or a cross-template may be the total number of positive events or actions associated with the features in the template, received in training examples. In addition, a rate of impressions may be a template performance criterion. A rate of impressions may be generated for a template based on the number of impressions associated with each of the features in a template out of the total number of occurrences associated with each of the features in the template. As described above, computing a number of impressions and/or an impression rate may also be performed for a cross-template based on all the features combined in a cross-template.
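The impression count and impression rate can be sketched in the same way. Only the “France” figures (10 impressions out of 11 occurrences) come from the text; impression counts for the other “country” features are not given, so they are left out rather than invented. The function names are hypothetical.

```python
def template_impressions(feature_impressions):
    """Total positively labeled examples across the features of a template."""
    return sum(feature_impressions.values())


def impression_rate(feature_impressions, feature_occurrences):
    """Impressions of the template's features out of their total occurrences."""
    total_occurrences = sum(feature_occurrences.values())
    return template_impressions(feature_impressions) / total_occurrences


# Figures for the single feature "France" from the example above.
france_impressions = {"France": 10}
france_occurrences = {"France": 11}

rate = impression_rate(france_impressions, france_occurrences)
print(template_impressions(france_impressions), round(rate, 3))
```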

The performance of a machine learning model in making predictions may be used to score the performance of a cross template included in the machine learning model. A machine learning model may be used by the machine learning system to make predictions based on the statistics generated for the features. The statistics generated for the features may help the machine learning system learn the weights for the features which may be part of the model and used to make predictions. For example, a model may be generated to predict the likelihood of a user selecting a specific search result for an automotive website. The model may contain weights w1, w2, and w3 for features associated with observed properties including a location being the United States, a preferred language being English, and a keyword “automobiles” in a previous search query, respectively. The generated model may be applied to a search provider such that when a user conducts a search, the weights are applied to the features corresponding to the user conducting the search. More specifically, if it is detected that the user is located in the United States, prefers English, and has previously searched for “automobiles” then the weights w1, w2, and w3 associated with each of the features, respectively, may be used to predict the likelihood of the user selecting the search result for the automotive website. Based on the predicted likelihood, the automotive website search result may be presented to the user. Alternatively, if it is detected that the user is not located in the United States, but prefers English and has previously searched for “automobiles”, then weights w2 and w3 may be used to predict the likelihood of the user selecting the search result for the automotive website.
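The text does not specify how the weights are combined into a likelihood. One common choice, assumed here purely for illustration, is to sum the weights of the features detected for the user and pass the sum through a logistic function. The weight values and feature names below are hypothetical.

```python
import math


def predict_likelihood(weights, detected_features):
    """Sum the weights of detected features and squash to (0, 1).

    Assumption: a logistic model; features without a weight contribute 0.
    """
    score = sum(weights.get(feature, 0.0) for feature in detected_features)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical values for w1, w2, and w3.
weights = {
    "country:United States": 0.4,   # w1
    "language:English": 0.3,        # w2
    "keyword:automobiles": 0.9,     # w3
}

# User in the United States who prefers English and searched "automobiles".
p_all = predict_likelihood(weights, list(weights))
# User outside the United States: only w2 and w3 apply.
p_w2_w3 = predict_likelihood(weights, ["language:English", "keyword:automobiles"])
print(p_all > p_w2_w3)
```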

In an implementation, a template performance score for a base template or a cross-template may be based on a degree of improvement of performance per feature of the machine learning model including a specific template relative to performance of the machine learning model excluding the specific template. For example, testing the performance of a machine learning model may be performed by assessing the accuracy of predictions made by the system based on the model including a specific template as compared to excluding the specific template. As a specific example, one or more statistics may be generated based on the cross template “country X keyword”, for example a statistic may be generated for the feature “United States X dogs” based on examples in which both features “United States” and “dogs” were present. As a result, the system may generate more accurate predictions related to detected features “United States X dogs” based on the statistic for “United States X dogs” as compared to the predictions made based on separate statistics for “United States” and “dogs.” In general, every feature in a template may have a performance score. One technique may be to sum the scores for every feature and use the sum as the overall performance score. Another technique may be to divide the sum of the scores for every feature in a template by the number of features in the template, to obtain a score per feature for the template as a whole.
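The two aggregation techniques just described, summing the per-feature scores and averaging them, differ only by a division. The sketch below uses hypothetical per-feature scores and hypothetical function names.

```python
def summed_score(feature_scores):
    """Overall template score as the sum of its per-feature scores."""
    return sum(feature_scores.values())


def score_per_feature(feature_scores):
    """Template score normalized by the number of features in the template."""
    return summed_score(feature_scores) / len(feature_scores)


# Hypothetical per-feature scores for part of a "country X keyword" template.
feature_scores = {
    "United States X dogs": 2.0,
    "Canada X dogs": 1.0,
    "France X dogs": 3.0,
}

print(summed_score(feature_scores), score_per_feature(feature_scores))
```

Normalizing per feature keeps a template with very many features from outscoring a smaller template merely because of its size.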

Similarly, a performance score may also be based on a measure of a degree of improvement of performance of the machine learning model including a specific template relative to performance of the machine learning model excluding the specific template. In general, there may be an objective used to evaluate the predictions made by a machine learning model. Typically, an objective related to the learning algorithm may be used. In some cases, the accuracy of predictions made by a machine learning system may be assessed by comparing how often the system predicts a positive event when the event is in fact positive. For example, based on a prediction made by the system, an automotive website search result may be presented to users 100 times. Out of these 100 presentations of the automotive website search result, 58 users may have selected (i.e., clicked on) the automotive website search result, indicating a positive outcome as a result of presenting the automotive website search result to users. This may indicate that predictions made by the machine learning model are accurate 58% of the time when presenting the automotive website search result. A cross-template such as “country X keyword” which may include the feature “United States X automobiles” may be included in the model. As a result, there may be an increase to 72% accuracy when presenting the automotive website search result to users. The cross-template performance score for the cross-template “country X keyword” may be based on this degree of improvement of performance of the machine learning model including the cross-template “country X keyword”. Any other technique for testing performance of a machine learning model based on a template may be used. Similarly, other performance criteria may be used to generate a template performance score for a template.
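The degree of improvement in this example may be computed as a simple difference in accuracy. Using the raw difference as the score is an assumption (the text only says the score “may be based on” the improvement), and the function names are hypothetical.

```python
def accuracy(selected, presented):
    """Fraction of presentations that led to a positive outcome."""
    return selected / presented


def degree_of_improvement(accuracy_without, accuracy_with):
    """Accuracy gained by including a template in the model."""
    return accuracy_with - accuracy_without


baseline = accuracy(selected=58, presented=100)  # model without the cross-template
with_cross = 0.72                                # accuracy stated for "country X keyword"
gain = degree_of_improvement(baseline, with_cross)
print(f"{gain:.2f}")
```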

FIG. 2 shows an example template exploration technique according to an implementation of the disclosed subject matter. In an implementation, a method may include obtaining multiple base templates 200 such as “country”, “result ID”, “language”, “gender” and “age”. Each of the templates “country”, “result ID”, “language”, “gender” and “age” may include multiple features F₁, F₂, . . . Fₙ. A template performance score may be obtained for each base template according to any template performance criteria, such as those described above. For example, the base template “country” may have a score of 88, the base template “result ID” may have a score of 76, the base template “language” may have a score of 73, the base template “age” may have a score of 62, and the base template “gender” may have a score of 56. Based on these template performance scores, the base template “country” may be selected based on it having the highest score of 88 as compared to the other base templates. As shown, the base template “country” 201 may be added to the machine learning model 210. Next, it may be advantageous to identify another template to cross with the template “country” 201 that may result in the model performing better than if the model only included the base template “country” 201. As such, at 202, multiple cross-templates may be constructed by generating a cross-template of the base template “country” and each of the base templates “result ID”, “language”, “age” and “gender”. In particular, a cross-template “country X result ID” may be generated, a cross-template “country X language” may be generated, a cross-template “country X age” may be generated, and a cross-template “country X gender” may be generated.

Next, the performance of a machine learning model may be tested based on each of the cross-templates to generate a cross-template performance score for each of the cross-templates “country X result ID”, “country X language”, “country X age”, and “country X gender”. A cross-template performance score may be generated for each of the cross-templates according to any cross-template scoring technique(s) such as those described above. As an example, the degree of improvement of performance of the machine learning model including each of the cross-templates may be measured and used to generate a score for each of the cross-templates. An assessment may be made as to which cross-template “country X result ID”, “country X language”, “country X age”, and “country X gender” results in a greater improvement in performance by the model as compared to performance of the model only including the base template “country” 201. As a result, the cross template “country X result ID” may receive a performance score of 82, the template “country X language” may receive a performance score of 72, the template “country X age” may receive a performance score of 63, and the template “country X gender” may receive a performance score of 43. These cross-template performance scores may indicate that the accuracy of predictions made by the system improved more by including the cross-template “country X result ID” (i.e., cross-template performance score of 82) as compared to the cross-template “country X gender” (i.e., cross-template performance score of 43). As a result, at 208, the cross-template “country X result ID” may be selected based on it having the highest cross-template performance score of 82 as compared to the other cross-templates.
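One round of the FIG. 2 walk-through can be traced with the scores given above. In this sketch the scores are read from lookup tables; in a real system each cross-template score would come from testing the model with that cross-template included. The helper `select_best` and the string encoding of templates are hypothetical.

```python
def select_best(scores):
    """Return the template with the highest performance score."""
    return max(scores, key=scores.get)


base_scores = {"country": 88, "result ID": 76, "language": 73, "age": 62, "gender": 56}
cross_scores = {
    "country X result ID": 82,
    "country X language": 72,
    "country X age": 63,
    "country X gender": 43,
}

model = []

# Select the highest scoring base template and add it to the model.
first = select_best(base_scores)
model.append(first)

# Cross the selected base template with each remaining base template.
candidates = [f"{first} X {other}" for other in base_scores if other != first]

# Score each candidate (looked up here) and add the best one to the model.
best_cross = select_best({c: cross_scores[c] for c in candidates})
model.append(best_cross)
print(model)
```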

As shown in FIG. 2, based on the selection of the cross-template “country X result ID” 204, this cross-template may also be added to the machine learning model 210 such that the machine learning model 210 would include a cross-template 205 that includes the template “country” 201 and the cross-template “country X result ID” 204, i.e., “[country], [country X result ID]”. The model 210 then may be used by a trained machine learning system to make a prediction and it may be found that the performance of the machine learning model 210 improves based on the cross template 205 “[country], [country X result ID]” as compared to the performance of the model including only one of the base templates “country” or “result ID”.

According to an implementation, the steps described herein may be repeated resulting in additional base templates and/or cross-templates being added to the model 210. For example, multiple cross-templates may be constructed by generating a cross-template of the cross-template “country X result ID” and each one of the templates “language”, “gender” and “age.” Again, the performance of the machine learning model may be tested based on each of the cross-templates to generate a cross-template performance score for each of the cross-templates “country X result ID X language”, “country X result ID X gender”, and “country X result ID X age”. Based on the cross-template performance scores, for example, the cross-template “country X result ID X language” may be selected based on it having the highest cross-template performance score. As a result, the cross template “country X result ID X language” may be added to the machine learning model 210. Accordingly, the machine learning model 210 then includes the template “country” 201, the cross-template “country X result ID” 204, and the cross template “country X result ID X language”, i.e., “[country], [country X result ID], [country X result ID X language]”.

Implementations of the disclosed subject matter may be used in machine learning models that may contain millions of billions of features in templates. A model based on a single template often is not informative enough to provide accurate predictions; instead, an aggregate of features is more helpful for predictions. As such, it is advantageous to construct cross templates that include multiple templates. Since exploring the space of all of the 100s of billions of feature templates is infeasible in such large-scale machine learning systems, it may be desirable to efficiently explore the space of templates based on estimating the performance gain of a model including a cross-template that contains a combination of multiple templates. With each iteration of the techniques described herein, a selection of a template is based on an assessment of the performance gain of the model with each new template addition. This technique may be used to optimize performance of a machine learning model by using a greedy strategy for selecting base templates and cross-templates to include in the model. As a result, a machine learning system may be able to grow a frontier of templates that improve the overall prediction accuracy of the machine learning system.

Embodiments of the presently disclosed subject matter may be implemented in and used with a variety of component and network architectures. FIG. 3 is an example computer system 20 suitable for implementing embodiments of the presently disclosed subject matter. The computer 20 includes a bus 21 which interconnects major components of the computer 20, such as one or more processors 24, memory 27 such as RAM, ROM, flash RAM, or the like, an input/output controller 28, and fixed storage 23 such as a hard drive, flash storage, SAN device, or the like. It will be understood that other components may or may not be included, such as a user display such as a display screen via a display adapter, user input interfaces such as controllers and associated user input devices such as a keyboard, mouse, touchscreen, or the like, and other components known in the art to use in or in conjunction with general-purpose computing systems.

The bus 21 allows data communication between the central processor 24 and the memory 27. The RAM is generally the main memory into which the operating system and application programs are loaded. The ROM or flash memory can contain, among other code, the Basic Input-Output System (BIOS), which controls basic hardware operation such as the interaction with peripheral components. Applications resident with the computer 20 are generally stored on and accessed via a computer-readable medium, such as the fixed storage 23 and/or the memory 27, an optical drive, external storage mechanism, or the like.

Each component shown may be integral with the computer 20 or may be separate and accessed through other interfaces. Other interfaces, such as a network interface 29, may provide a connection to remote systems and devices via a telephone link, wired or wireless local- or wide-area network connection, proprietary network connections, or the like. For example, the network interface 29 may allow the computer to communicate with other computers via one or more local, wide-area, or other networks, as shown in FIG. 4.

Many other devices or components (not shown) may be connected in a similar manner, such as document scanners, digital cameras, auxiliary, supplemental, or backup systems, or the like. Conversely, all of the components shown in FIG. 3 need not be present to practice the present disclosure. The components can be interconnected in different ways from that shown. The operation of a computer such as that shown in FIG. 3 is readily known in the art and is not discussed in detail in this application. Code to implement the present disclosure can be stored in computer-readable storage media such as one or more of the memory 27, fixed storage 23, remote storage locations, or any other storage mechanism known in the art.

FIG. 4 shows an example arrangement according to an embodiment of the disclosed subject matter. One or more clients 10, 11, such as local computers, smart phones, tablet computing devices, remote services, and the like may connect to other devices via one or more networks 7. The network may be a local network, wide-area network, the Internet, or any other suitable communication network or networks, and may be implemented on any suitable platform including wired and/or wireless networks. The clients 10, 11 may communicate with one or more computer systems, such as processing units 14, databases 15, and user interface systems 13. In some cases, clients 10, 11 may communicate with a user interface system 13, which may provide access to one or more other systems such as a database 15, a processing unit 14, or the like. For example, the user interface 13 may be a user-accessible web page that provides data from one or more other computer systems. The user interface 13 may provide different interfaces to different clients, such as where a human-readable web page is provided to web browser clients 10, and a computer-readable API or other interface is provided to remote service clients 11. The user interface 13, database 15, and processing units 14 may be part of an integral system, or may include multiple computer systems communicating via a private network, the Internet, or any other suitable network. Processing units 14 may be, for example, part of a distributed system such as a cloud-based computing system, search engine, content delivery system, or the like, which may also include or communicate with a database 15 and/or user interface 13. In some arrangements, an analysis system 5 may provide back-end processing, such as where stored or acquired data is pre-processed by the analysis system 5 before delivery to the processing unit 14, database 15, and/or user interface 13. For example, a machine learning system 5 may provide various prediction models, data analysis, or the like to one or more other systems 13, 14, 15.

More generally, various embodiments of the presently disclosed subject matter may include or be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. Embodiments also may be embodied in the form of a computer program product having computer program code containing instructions embodied in non-transitory and/or tangible media, such as CD-ROMs, DVDs, hard drives, USB (universal serial bus) drives, flash drives, or any other machine-readable storage medium, such that when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing embodiments of the disclosed subject matter. Embodiments also may be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation. When the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing embodiments of the disclosed subject matter. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits. In some configurations, a set of computer-readable instructions stored on a computer-readable storage medium may be implemented by a general-purpose processor, which may transform the general-purpose processor or a device containing the general-purpose processor into a special-purpose device configured to implement or carry out the instructions. Embodiments may be implemented using hardware that may include a processor, such as a general-purpose microprocessor and/or an Application Specific Integrated Circuit (ASIC) that embodies all or part of the techniques according to embodiments of the disclosed subject matter in hardware and/or firmware. The processor may be coupled to memory, such as RAM, ROM, flash memory, a hard disk or any other device capable of storing electronic information, as previously described. The memory or other storage medium may store instructions adapted to be executed by the processor to perform the techniques according to embodiments of the disclosed subject matter.

The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit embodiments of the disclosed subject matter to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to explain the principles of embodiments of the disclosed subject matter and their practical applications, to thereby enable others skilled in the art to utilize those embodiments as well as various embodiments with various modifications as may be suited to the particular use contemplated.

1. A computer-implemented method comprising: obtaining a computer-implemented machine learning model having features; obtaining three or more base templates, each base template comprising a plurality of features, wherein the features of each of the base templates are not found in any of the other base templates and the features of each of the base templates are not features of the machine learning model at a time of obtaining the three or more base templates; obtaining a respective template performance score for each base template; selecting a first base template from the three or more base templates based on the first base template having the highest template performance score relative to the performance scores of the other base templates; constructing a first plurality of cross-templates by generating a respective cross-template of the selected first base template and each of the other base templates; testing the performance of a computer-implemented machine learning model based on each of the first plurality of cross-templates to generate a respective cross-template performance score for each of the first plurality of cross-templates; selecting a first cross-template of the first plurality of cross-templates based on the cross-template performance score of the first cross-template; and adding the features of the first cross-template to the machine learning model.
2. The method of claim 1, further comprising adding the features of the first base template to the machine learning model.
3. The method of claim 1, wherein the cross-template performance score of the first cross-template measures a degree of improvement of performance per feature of the machine learning model including the first cross-template relative to a performance of the machine learning model excluding the first cross-template.
4. The method of claim 1, wherein the cross-template performance score of the first cross-template measures a degree of improvement of performance of the machine learning model including the first cross-template relative to performance of the machine learning model excluding the first cross-template.
5. The method of claim 1, wherein the template performance score of each base template is based on a degree of performance of a machine learning model including each base template.
6. The method of claim 1, wherein the performance score of each base template is based on a number of occurrences associated with each feature in a base template.
7. The method of claim 1, wherein the performance score of each base template is based on a number of impressions associated with each feature in a base template.
8. (canceled)
9. The method of claim 1, further comprising: constructing a second plurality of cross-templates by generating a cross-template of the first cross-template and each one of the other cross-templates in the first plurality of cross-templates; testing the performance of a machine learning model based on each of the second plurality of cross-templates to generate a cross-template performance score for each of the second plurality of cross-templates; and selecting a second cross-template of the second plurality of cross-templates based on the cross-template performance score of the second cross-template.
10. The method of claim 9, further comprising: adding the features of the second cross-template to the machine learning model.
11. The method of claim 1, further comprising: training the machine learning model; and using the trained machine learning model to make a prediction.
12. A computer-implemented machine learning system configured, by instructions adapted to be executed by the machine learning system, to: obtain a computer-implemented machine learning model having features; obtain three or more base templates, each base template comprising a plurality of features, wherein the features of each of the base templates are not found in any of the other base templates and the features of each of the base templates are not features of the machine learning model at a time of obtaining the three or more base templates; obtain a respective template performance score for each base template; select a first base template from the three or more base templates based on the first base template having the highest performance score relative to the performance scores of the other base templates; construct a first plurality of cross-templates by generating a respective cross-template of the selected first base template and each of the other base templates; test the performance of a machine learning model based on each of the first plurality of cross-templates to generate a respective cross-template performance score for each of the first plurality of cross-templates; select a first cross-template of the first plurality of cross-templates based on the cross-template performance score of the first cross-template; and add the features of the first cross-template to the machine learning model.
13. The system of claim 12, wherein the machine learning system is further configured to add the features of the first base template to the machine learning model.
14. The system of claim 12, wherein the cross-template performance score of the first cross-template measures a degree of improvement of performance per feature of the machine learning model including the first cross-template relative to a performance of the machine learning model excluding the first cross-template.
15. The system of claim 12, wherein the cross-template performance score of the first cross-template measures a degree of improvement of performance of the machine learning model including the first cross-template relative to performance of the machine learning model excluding the first cross-template.
16. The system of claim 12, wherein the template performance score of each base template is based on a degree of performance of a machine learning model including each template.
17. The system of claim 12, wherein the performance score of each base template is based on a number of occurrences associated with each feature in a template.
18. The system of claim 12, wherein the performance score of each base template is based on a number of impressions associated with each feature in a template.
19. (canceled)
20. The system of claim 12, wherein the machine learning system is further configured to: construct a second plurality of cross-templates by generating a cross-template of the first cross-template and each one of the other cross-templates in the first plurality of cross-templates; test the performance of a machine learning model based on each of the second plurality of cross-templates to generate a cross-template performance score for each of the second plurality of cross-templates; and select a second cross-template of the second plurality of cross-templates based on the cross-template performance score of the second cross-template.
21. The system of claim 20, wherein the machine learning system is further configured to: add the features of the second cross-template to the machine learning model.
22. The system of claim 12, wherein the machine learning system is further configured to: train the machine learning model; and use the trained machine learning model to make a prediction.